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Q\ • Abstract 

Detecting and locating changes in highly multivariate data is a major concern in several 
^H , current statistical applications. In this context, the first contribution of the paper is a novel 

^Q ■ non-parametric two-sample homogeneity test for multivariate data based on the well-known 

Wilcoxon rank statistic. The proposed two-sample homogeneity test statistic can be extended 
to deal with ordinal or censored data as well as to test for the homogeneity of more than two 
samples. We also provide a detailed analysis of the power of the proposed test statistic (in 
the two sample case) against asymptotic local shift alternatives. The second contribution of 
the paper concerns the use of the proposed test statistic to perform retrospective change-point 
detection. It is first shown that the approach is computationally feasible even when looking 
^ ■ for a large number of change-points thanks to the use of dynamic programming. Computable 

asymptotic p-values for the test are available in the case where a single potential change-point 
C*^ ■ is to be detected. The proposed approach is particularly recommendable in situations where 

0^ I the correlations between the coordinates of the data are moderate, the marginal distributions 

are not well modelled by usual parametric assumptions (e.g., in the presence of outliers) and 
t~^ ■ when faced with highly variable change patterns, for instance, if the potential changes only 

^D , affect subsets of the coordinates of the data. 



1 Introduction 



Detection and location of distributional changes in data is a major statistical challenge that 
C^ \ arises in many different contexts. This very general concern can be particularised to more spe- 

cific tasks such as segmentation, novelty detection or significance tests. In this contribution, we 
focus on two types of problems: homogeneity testing, where the statistician is presented with pre- 
specified groupings of the data that are believed to be comparable, and change-point detection, 
in which a series -most often, a time series- is to be segmented into homogeneous contiguous 
regions. These two tasks are obviously related but the latter is more challenging as the appro- 
priate groupings of the data are unknown, although one does have the strong prior assumption 
that homogeneous regions of the data are contiguous. Homogeneity testing and/or change- 
point detection are instrumental in applicatio ns that range from th e surveillance of indus- 
trial processes (^Basseville and Nikif orovL Il993|) , to computer security fTartakovskv et all bood 



Levy-Leduc an d Roueff, 200 9), processing of audiovisual data (Desobry et al ., 2005|), financial 



and econometric modelline (IBai and FerronI, 120031; IXalih and Hengartaiei , 2005), health monitor- 



ing terodsky and DarkhovskyI, 12000 0, or bioinformatics IPicard et al.l bOOSt IVert and Bleakley , 



2010 ). 



In light of the important available literature on change-point detection it is important to 
make two additional distinctions. First, in many cases the data to be analysed can be assumed 
to present some form of global reproducibility and to include several instances of actual changes. 
In this case, it seems reasonable to fit a model to the data to profit from the available statistical 



inforination regarding various relevant aspects of the problem such as the distribution of the 
data in the absence of change, the typical change-point patterns, etc. In such situations, very 
convincing results have been demonstrated using Bayesian approaches due to the existence 
of efficient com putational methods to explore the posterior distribution, even when using very 
flexible models (Barry and Hartiganl,ll992l:lFearnhead!, 2006). In contrast, in this contribution, we 
consider scenarios in which the data are either scarce or very variable or where potential changes 
occur somewhat infrequently. In this alternative context, the goal is to develop approaches that 
make as few assumptions as possible regarding the underlying distribution of the data or the 
nature of the changes and that do not rely on the observation of actual change patterns. The 
second important distinction is that many works in t he time series literature consider the online 
change-point detection framew^ork (iSiegmundl, Il985l) in which the data have to be processed 
on -the-fl y or with minimal delay using for instance the CUSUM algorithm initially proposed 
by IPagd 11954). In the following, we consider the opposite situation, sometimes referred to as 
retrospective analysis, in which all the data to be tested have been recorded and are available for 
analysis. 

In this context, the first contribution of this work consists in novel homogeneity tests for deal- 
ing with possibly high-dimensional multivariate observations. We focus on situations where the 
potential changes are believed to have a strong impact on the mean but where the distribution 
of the data is otherwise mostly unknown. For scalar observations, there are well-known ro- 
bust solutions for testing h omogeneity in th is context such as the Wilcoxon/Mann-Whitney or 
Kruskal-Wallis procedures JLehmannL Il975|) to be further discussed below. For multivariate ob- 
servations, the situation is far more challenging as one would like to achieve robustness with 
respect both to the form of the marginal distributions and to the existence of correlations (or 
other form of depende nce) between cq o rdutates. The latter aspect has been add r essed in a se- 
ries of works by Mottonen et alj l|l997|) : iHet tmansperger et al.| l|l998|) : 1^ (|l999|) : lTbpchii et aL 



(|2003,) who studied multivariate extensions of sign and rank tests. These tests are affine invari 
ant in the sense that they behave similarly to Hotelling's T^ test (see Section l23l below) — which 
is optimal for Gaussian distributions- for general classes of multivariate distributions having 
ellipsoidal contours (e.g., for multivariate t distributions). Another promi sing approach inves- 
tigated b y several recent works corisists i n using kernel-based methods (Desobry et al.L 12005 : 
iGretton et al.l. l2006c iHarchaoui et al.l.l2008|) . In our experience however, these methods that can 
achieve impressive results for moderately multidimensional data or in specific situations {e.g., 
if the data lie on a low-dimensional manifold) lack robustness when moving to larger dimen- 
sions. In particular, as illustrated in Section l4H below. kernel-based methods are not robust with 
respect to the presence of contaminating noise and to the fact that the changes to be detected 
may only affect a subset of the components of the high-dimensional data. The latter scenario 
is of very important practical significance in applications where the data to be analysed consist 
in e xhaustive recordings of c omplex situations that are only partly affected by possible changes 
(seel Levy-Leduc and Rouefa 1 12009,) for an example regarding the detection of computer attacks). 
The method proposed in this work is based on a co mbination of marginal rank statistics, fol- 
lowing the pioneering idea of I Wei and LachinI (|l984 ). Compared to the latter, our contribution 
is twofold: first, we sh ow" how to correct the bias that appears in the test statistic proposed by 



Wei and LachinI (|1984)1 whenever the two samples are not balanced in size; we then show how 



to extend this idea to the case of more than two groups. The numerical simulation presented in 
Section m confirms that the proposed test statistic is significantly more robust than kernel-based 
methods or approaches based on least-squares or Hotelling's T statistics. These empirical ob- 
servations are also supported by the results detailed in Section 12.31 regarding the power of the 
proposed test statistic (in the tw^o sample case) with respect to asymptotic local shift alternatives. 
In particular, although the proposed test is not strictly affine invariano it compares favourably 



iJt is highly unlikely that there exist statistics which are both invariant under monotonic transformations of the 
marginals -as the proposed method is- and affine invariant. 



to Hotelling's T^ statistic for important classes of distributions under the assumption that the 
condition number of the correlation matrix of the data is not too large. 

We then consider the use of the proposed approach for change-point detection by optimis- 
ing the test statistic over the -now considered unknown- positions of the segment boundaries. 
Although simple this idea raises two type of difficulties. The first one is computational as 
the resulting optimisation task is combinatorial and cannot be solved by brute force enumer- 
ation when there is more than one change-point (that is, two segments). In the literatu r e this 
issue has been previously tackled either using dynamic programming ( Bai and PerronI, 20031: 



Harchaoui and Cappe, l2007i) o r more recently using Lasso- type penalties fHarchaou i and Levy-Leduc , 



2010; Vert and Bleakleyi l2010|) . We show that the generic dynamic programming strategy is ap- 
plicable to the proposed test statistic making it practically suitable for retrospective detection 
of multiple change-points. The second difficulty is statistical as the optimisation with respect 
to the change-point locations modifies the distribution of the test statistics. Thus, the design 
of quantitative criterions for assessing the significance of the test is a challenging problem in 
this context. This issue h as been con sidered before, mostl y in the case of a single change-point. 



for various test statistics llCsorgo and Horvath.. 1997; .Chen and Guptal, 12000) . In many cases the 



asymptotic distribution of the test remains hard to characterise and must be calibrated using 
Monte Carlo simulations. We show that for a simple modification of the proposed rank-based 
statistic, one can indeed obtain computable asymptotic p-values that can be used to assess the 
significance of the test when looking for a single change-point. 

The paper is organised as follows. Section |2] is devoted to homogeneity testing, starting 
first with the tw^o-sample case and then considering the more general situation where several 
predefined groups are available. The end of Section|2]is dedicated to the study of the asymptotic 
behaviour of the two-sample homogeneity test under local shift alternatives. In Section |3l the 
proposed test statistic is modified to provide a method for detecting and locating change-points, 
with the computation of p-values (in the single change-point case) being discussed in Section |3l2l 
The results of numerical experiments carried out both on simulated and on real data are then 
reported in Section ID 

2 Testing for Homogeneity 

We first tackle in this Section the so-called tw^o-sample problem, that is testing the homogeneity 
between two partitions of data. The proposed test statistic is then extended in Section 12.21 to 
deal with more than two groups of data. 

2.1 Two-sample homogeneity test 

Consider n X-dimensional multivariate observations (Xi,...,X„) and denote by X,j- the fcth 
coordinate of X,, such that X, = (X, i, . . .,X,j^)', where the prime is used to denote transpo- 
sition. We consider the classical statistic test framework with the null (or baseline) hypoth- 
esis, (Hq): "(Xi, ...,X„) are identically distributed random vectors", and the alternative hy- 
pothesis, (Hi): "(Xi,. . .,X„j) are distributed under Pj and (X„j+i, . . .,X„) under P2, with 
Pj 7^ P2". In this setting, the potential change point rii is assumed to be given but the data 
distributions are fully unspecified both under (Hq) and (Hi). The proposed test statistic ex- 
tends the well-known Wilcoxon/Mann-Whitney rank-based criterion to multivariate data by 
considering the asymptotic joint behaviour of the rank statistics that can be computed from 
each coordinate of the observations. For k in {1, . . . ,K}, define the vector-valued statistic 
U„(ni) = (LJ„,i(ni),...,!J„,K(«i))'by 

Un,k{ni) = , / E E |l(^a < H^) - 1(Xm < X,-fc)} . (1) 



^Jnni{n-ni) i^^j^ 



Hl+l 



Although the form of the statistic given above is more appropriate for mathematical analysis 
as well as for discussing possible generalisations of the approach (see Section [2.1.21 below), it 
is important to realise that Lf,, jt('^l) is related to the classical Wilcoxon/Mann-Whitney statistic 

computed from the series Xj j., . . . , X„ j-. Assuming that there are no ties in the data, let R- 
denote the rank of Xj j- among (Xj z^, . . . , X„^](), that is, R^' = E/Li 1(X,;jc < ^;,/c)- Noticing that 
Ej-Li ^1 = "(" + l)/2, it is then easily verified that !i„,j:(ni) can be equivalently defined as 

Vnni(M-ni) ;^i V 2 / ^nni{n - rii) j^„^+-^\ ' 2 J 

This alternative form of U„ jt(wi) is more appropriate for computational purposes as discussed in 
Section [2. l.ll below. For convenience, w^e denote by F„ j-(f) = n^^ '^1=1 '^i-^j,k — the empirical 
cumulative distribution function (c.d.f. in short) of the fcth coordinate, such that F„j^{Xj]^) = 
R^ 'In. Let £,1 denote the X-dimensional empirical covariance matrix defined by 

K»' = - E{F,a(Xyc) - i/2}{F„,,KXa') - 1/2}, \<kM <^. (3) 

The test statistic that we propose for assessing the presence of a potential change in n\ is defined 
as 

S„(«i)=U„(«i)'£-iu„(ni). (4) 

Theorem[T]below (proved in AppendixEJ gives the limiting behaviour of the test statistic S„(ni) 
under the null hypothesis. 

Theorem 1 Let Xj,. . .,X„j,X„j+i, . . .,X„ he "^-valued i.i.d. random vectors, such that for all k in 
{1, ...,X}, the cumulative distribution function F^- 0/ X^ j- is a continuous function. Assume that 
n-i/n -^ ti in (0,1) as n tends to infinity, and that the K x K covariance matrix E defined by 



-tt' 



4 Cov (F,(Xi,,); F,, (Xi,,,)) ,'^<Kk' <K, (5) 



is positive definite. Then, the test statistic S„(«i) defined in (|D converges in distribution to a x^ distri- 
bution with K degrees of freedom. 

Theorem [1] shows that the proposed test is well normalised with respect to the dimension 
K, the length n of the data and the postulated change-point location n^. It is asymptotically 
distribution-free in the sense that its limiting behaviour under (Hq) does not depend on the 
distribution of the data. By construction, it is also invariant under any monotonic transformation 
of the coordinates of X,-. 

The matrix Z, which corresponds to the asymptotic covariance matrix of the vec tor \J„(n-\) 



is equal, up to a rnuitipii cative constant, to me bpearman correlation matrix ot A, qLenmann , 
1975l:lvan der Vaartl.ll998|) . This is a well-known robust measure of dependence that appears in 



particular when using copula models. A sufficient condition for ensuring the invertibility of IL 
is thus that no linear combination of the Fjt(Xi jt)'s should be almost surely equal to a constant, 
which is arguably a very weak condition. It is easily checked that the diagonal elements of E 
are all equal to 1/3 and that IL;.f = E^;. = whenever the fc-th and i?-th coordinates of X, are 
independent. It appears, in practice, that the diagonal elements of £„ converge very rapidly to 
1/3 value and we did not observe any significant improvement when trying to take into account 
this fact when estimating H. 

Theorem [1] defines the asymptotic false alarm rate associated with the test statistic Snin-[). 
The test is consistent (i.e., its power tends to 1) for all alternatives such that the condition en- 
suring the consistency of the standard Wilcoxon/Mann-Whitney two-sample test holds true for 



at least one coordinate. More formally, the proposed test is consistent when there exists k in 
{!,. . .,K} such that P(Xj-i < Xj;„) 7^ 1/2. In the scalar case, this condition is known to hold 
for general classes of changes such as shift (change-in-the-mean) models or scale (multiplica- 
tive) change for positive variables. We defer to Section |3] a more detailed investigation of the 
asymptotic power of the test in the case of multivariate shift alternatives. 

As Theorem [1] is an asymptotic result, we have carried out Monte Carlo simulations to 
asses the accuracy of the approximation for finite sample sizes. Using data with independent 
coordinateqj we found that the distribution of S„{ni) defined in (|4]l can be considered close 
enough to the limiting distribution, as measured by the Kolmogorov-Smirnov test at level 1%, 
when n is at least 8 times larger than K. For instance, for K = 20, a value of n = 210 was 
sufficient; K = 100 required n = 840 samples, etc. The empirical density of the test statistics is 
illustrated in the upper part of Figure[T]when X = 10 and n = 200. 
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Figure 1: Histograms of the statistics S„(ni) (top) and Wei and Lachin's (bottom) compared 
to the xjc P-d-f-, as a function of the ratio rii/n. Data corresponds to n = 200 samples of a 
ten-dimensional standard Gaussian distribution. 



The pioneering work of IWei and LachinI (|l984|) describes a result analogous to that of The- 
orem [T] in the case of possibly upper-censored data. How^ever, the proof technique used by 
Wei and LachinI (Il984|) relies on a different interpretation of Z which is used to derive a weight- 
ing matrix that is not equal to £„ as defined in (|3|. In contrast, our proof (see Appendix |Aj 
is based on a standard argument for U-statistics (the Hoeffding decomposition) that directly 
returns an expression of E in terms of covarian ces, fo r which usual estimators, such as £„, may 
be used. The test statistics of .Wei and LachinI ( 1984 ) thus differs from S„(ni) and turns out to 
be biased in cases where n^ 7^ n/2, i.e., when the change does not occur in the centre of the 
observation frame, as shown by the bottom part of Figure [l] This bias becomes problematic 
when the potential change location «i is unknown because the values of S„(«i) for different 
values of n^ cannot be validly compared. 



2.1.1 Implementation Issues 

As noted above, the vector (!i„^j:(«i))i<i:<K should be computed from the marginal rank statis- 
tics in the form given in (|2j. £„ also is a simple function of those marginal ranks. Thus, 
{Un,k{'^i))i<k<K can be computed in [Kn\og[n)) operations using a sort for computing the 



^Note that by construction, ttie test statistic is then fully invariant with respect to the precise distribution used for 
the Monte Carlo simulations. 



ranks, as the average numerical complexity of usual sorting algorithms is of the order of n log(n) 
operations. The computation of £„ then requires (K^n) operations and its inversion (K^) opera- 
tions. Note that if the test statistic needs to be recomputed at a neighbouring index, say ni + 1, 
neither the ranks nor £„ and its inverse need to be recomputed. Hence the number of additional 
operations required to compute S„(ni + 1) is indeed very limited. 

In some situations, it may happen that th e empirical estimate £ n becomes ill-conditioned 



rendering its inversion numerically unstable. IWei and LachinI (|l984l) suggested to circumvent 



the problem by adding some small positive value to the diagonal elements of £,„ . It is important 
to realise however that a particular case where Z, itself can be ill-conditioned is when coordinates 
of Xj are strongly dependent. In the limiting case where coordinates of Xj, say two of them for 
illustration, are duplicated, E becomes a matrix of rank X — 1 . In such a case, the correct statistic 
is obtained by simply discarding one of the coordinates that are duplicated. Hence, to regularise 
£„ in such cases, we suggest inverting it using its Moore-Penrose pseudo inverse: if £„ = USU' 
denotes the singular value decomposition of £„, with S = diag(si,. . .,sk) being the diagonal 
matrix of eigenvalues of 2L, then the pseudo inverse Zj'j is defined as U' diag(s|, . . . , s|-) LT where 
st = s^ l{si > e) and e is a fixed positive threshold. Instead of relying on the asymptotic result 
of Theorem [TJ it is suggested to compare Sn{ni) to the quantiles of the x^' distribution, where 
K' is the number of non-null values among the s['s. As already mentioned however, some terms 
of £,! appear to converge very rapidly and the matrix is only very rarely ill-conditioned, even 
when n is only slightly larger than K. On the other hand, the regularised variant described 
above was found to be effective for dealing with signals whose coordinates can be extremely 
dependent, e.g., if there is a quasi-deterministic relationship between two coordinates. 

2.1.2 Discrete, missing or censored data 

Theorem [1] requires the continuity of the c.d.f . F^- of each coordinate; hence it is not directly 
applicable, for instance, to discrete variables. In such cases however. Theorem [T] is still valid 
upon redefining Z, as 



Zjtjt' — E 



{F,(X-,) + F,(Xi,,) - 1}{F,,(X-,,) + F,,(Xi,,,) - 1} , (6) 



where Fj.(x ) denotes the left-limit of the c.d.f. in x. In this case, ^ has to be replaced by 

V""l("-«l)/=«i+l I 2 J 

Another useful extension concerns the case of censored or missing data that can be dealt 
with in great generality by introducing lower X, ;, and upper X, /^ censoring values such that 
Xj J. < X, jt < X, jt, where a strict inequality indicates censoring (for missing values, simply set 
X,j. = —00 and X,j- = +oo). In this case, we define a modified statistics from the censoring 
bounds Xj j. and X,- j- by 

Un,k{ni) = , / . E E {W,k < ^j,k) - 1(% < X/,)c) } • (7) 



It can be shown, adapting the arguments of iLung-Yut-Fong et alj l |2012|) who studied the use of 



the scalar statistic (0 for change-point detection, that Theorem [T] then holds with 



^tt' — E 



{F/c(Xu) +£t(Xu) - 1}{Ff(Xu') +&'(Xu') - 1} 



(8) 



where T^. and Fj. denote the c.d.f.'s of Xj j^ and Xi,yt' respectively. 



2.2 Testing homogeneity within several groups of data 

In this section, the procedure presented so far is extended to deal with more than two groups of 
multivariate data. The resulting test statistic is again based on a proper combination of marginal 
statistics involved in the Kruskal-Wallis procedure that generalises the classical Wilcoxon-rank 
test when there are more than two groups of data. 

Consider the null hypothesis that L given groups, Xi, . . . , X„^; X„j+i, . . . , X,,^; . ..; X„^^_^^i, . . . , X„j^, 
share the same distribution, where we shall use the convention that Mq = and «£ = n. 
For ; in {!,...,«} and /c in {!,..., K}, denote as previously by R- the rank of X; /t among 

(Xi i;, . . . , X„ /(■) that is, R' = EjLj l{x i<X',} ■ For £ in {0, . . . , L — 1}, define the average rank in 

group £ for the ^th coordinate by R^ — (n^+i — w^)"^ Y^l^^.j^i R) ■ Consider the following test 
statistic: 

4 ^-1 
Tim, . . . , "L-i) = -2 E ("m - ni)K ^n ^f , (9) 

where the vector Rf is defined as R( = {R^ ' — {n + 1) /2, . . . , K), — (n + 1) /2)', and Z,„ is again 
the matrix defined in ||3j. Theorem |2l proved in Appendix |Bl describes the limiting behaviour of 
the test statistic T{ni, . . . , «l-i) under the null hypothesis. 

Theorem 2 Assume that (X/)i<;<„ are ^^-valued i.i.d. random vectors such that, for all k, the c.d.f. Fi^ 
of^i,k is a continuous function. Assume also that for £ = 0,. . .,L — 1, there exists ff+j in (0,1) such 
that (n^+i — nf)/n -^ tf^i, as n tends to infinity. Then, T{ni, . . . , wl-i) defined in Q satisfies 

T(«i,...,ni.-i)^A:^((i^-l)^) ,flsn^c«, (10) 

where d denotes convergence in distribution and x^{{L — l)K) is the chi-square distribution with (L — 
1)K degrees of freedom. 

Observe that lO extends the classical Kruskal-Wallis test used for univariate observations to 
the multivariate setting. Indeed, when K = 1, ^ is equivalent to 

T(ni «i,-i) = ^E ("W - ni) (r^ ' - (« + l)/2)' , (11) 

where we have replaced £,,11 by En = 4Var(Fi(Xi i)) = 4Var(L7) = 1/3 (Lf denoting a uniform 
random variable on [0,1]). In the case where there is only one change-point, i.e., when L = 2, 
^ reduces to the test statistic proposed in Section l27ll Indeed, using (|2|, T{ni) can be rewritten 
as follows 

^ ""i(»-»i) u„(nO-t.U„(nO + "';f-"\) u„(.0%,U„(nO 
n^ni n^{n — ni) 

= Un(ni)'l.„U„(ni) = S„{ni) , 
where S„(«i) is defined in ||4)|. 

2.3 Power of the homogeneity test in the two sample case 

In this section, w^e focus on the two sample homogeneity test statistic S„(ni) defined in ||4| 
using local alternatives consisting of multivariate shifts to investigate the statistical power of 
the proposed approach. We consider the following alternative hypotheses (Hi „): "(Xi, . . . , X„j ) 
are i.i.d with common multivariate p.d.f. /(x) and (X„j+i, . . . ,X„) are i.i.d with density /(x — 
5 1 \pn)" , where / is symmetric, positive, and, continuously differentiable and 5 denotes an 



arbitrary shift vector in R*^. Our result s ca n thus be compared directly with those obtained by 
JHettmansperger et al. (19981), lOjal l|l999|) and lTopchii et alj l|2003|) for affine invariant multivariate 



generalisations of rank tests. 

We first recall the classical result pertaining to the Hotelling-T^ test 

H„(«i) = (ni(« - «i)/«)(Xni - Xn-nJ'QHxni - Xn-nj) , 

where x„^ — (I]"!^ X,)/ni, Xn-m = (E/Lh +i ^i)/(w ~ "i)/ ^J^d, C„ is the empirical covariance 
matrix of the X/'s. Under (Hi„), assuming that n tends to infinity with rii/n — > fi and under 

appropriate moment conditions, it holds that H„(«i) — > XKi^ni^))/ where 

dH{S) = h{l-h)S'C-'S, (12) 

Xf^id) denoting the non-central chi-squared distribution with K degrees o f freedom an d non- 
centrality parameter d and C denoting the covariance matrix of the X/'s, see iBickell lll965|) . 



Let f (Xj) = (Fi(X,i),. . ■ , Fi^{Xif(^))' denote the _K-dimensional vector of marginal distribu- 
tion functions and Vlog/(X,) the score function. The following theorem (proved in Appendix 
let establishes the asymptotic behaviour of S„(«i) under (Hi „). 

Theorem 3 Assume that Xi,. . .,X„j,X„j+i,. . .,X„ are M.^-valued random vectors distributed under 
(Hi^„) and that the Kx K covariance matrix E defined by (|5j is positive definite. Assume also that the 
Fisher information matrix It = E/[Vlog/(Xi)Vlog/(Xi)'] is finite and that the densities /j- o/Xj j. 
are upper bounded for all k. Then, as n tends to infinity with n^/n -^ ti & (0, 1), 

S„(«i) A xl (4ti(l - t,)S'A'i:-'AS) = xlidsiS)) , (13) 

where E is defined in ^andA = E^[(F(Xi) - l/2)Vlog/(Xi)']. 

The following corollary, which is proved in Appendix |D1 particularises this results to the 
case where the coordinates of the observations are independent. 

Corollary 1 Assume that the density f may be written as the product of its marginals, f{\) = nf=i fk i^k) 
Then, 

dniS) = fi(l - fi) E ^ ""'^ ds{S) = 12fi(l - fi) E -^4 ' 

k=l ^t /c=l ^t 

where c^l = J x'^fj^{x)dx and A/^ = J {o'kfki'^k^)) '^^• 

Moreover, d^{S) > (108/125)rfH(<^)/ ^^^th ds{S) being equal to [3 / n)d}j{5) when /j- are Gaussian 
densities. 

Corollary [1] states that the so-called asymptotic relative efficiency (ARE) with respect to the 
Hotelling-T^ test ds{5)/du{5) is lower boimded by 108/125 k. 0.86. Granted that this result 
holds irrespectively of the choice of the marginals f^ this is a strong guarantee that extends 
the well-known scalar result. As will be illustrated in Section 14.2. 2[ the ARE is much larger 
whenever some of the marginal have strong tails. In the particular case of Gaussian densities, 
the ARE of the proposed test is equal to 3 /ttw 0^95, which is comparable to the values obtained 



for affine invariant multivariate rank tests (! Ojal,ll999L p. 331). 



For the particular case of multivariate Gaussian distributions, an explicit expression of the 
ARE can be obtained without assuming independence, as shown by the following corollary 
(proved in Appendix |E|. 

Corollary 2 Assume that f is a multivariate Gaussian p.d.f. with mean and covariance matrix C, 
then the matrices A and IL featured in Theorem \3\are given by A = {2y/Tz)~^ diag(c7'j~ ,.. .,cr^ ) and 
Eji-^ = 2/7rarcsin(Cjt^/(2c7'ji-c7'^)),/or 1 < k,£ < K, where {a^)i^]^<f^ denote the diagonal elements of C. 
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Furthermore, the asymptotic relative efficiency of the test statistic Sn{n{) can he lower hounded as 
follows 

ds/{S)du{S) > (3/7r)(c7-^/c7^,J(A^i,(C)/A„,ax(|C|)), 

where i7^j^ and cr^ax denote, respectively, the minimal and maximal diagonal terms ofC, Acinic) is the 
minimal eigenvalue of C and A,„ax{\C\) the maximal eigenvalue of \C\ = i\Ck/\)i<k,i<K- 



We observe here an important difference with the test statistic of iMottonen et alj (|l997l) which 



is designed so as to guarantee that its ARE in the multivariate Gaussian case does not depend 
on the value of C. For the proposed statistic, the ARE is low^er bounded by the minimal eigen- 
value of A'T^^^AC, where A and E are defined in Corollary ID Recall that E is proportional to 
the Spearman correlation matrix whereas Corollary |2] implies that AC A' is proportional to the 
standard correlation matrix. Empirically, it can be checked using numerical simulation that the 
minimal eigenvalue of A' 11^^ AC can only be small when C itself is poorly conditioned. The 
second statement of Corollary |2] substantiates this claim by providing a lower bound which, for 
positively correlated C at least, is inversely proportional to the condition number of C. Note 
that this bound appears to be pessimistic in practice as for values of X in the range 2 to 6, we 
observed the ARE to be larger than 0.8 for all matrices with condition number smaller than 10 
(recall that the ARE is equal to 0.95 in the scalar case); for matrices with condition number up 
to 100, the ARE was still larger than 0.3. Hence, in situations where the coordinates of the data 
are not too correlated, despite the fact that the proposed test is not affine invariant, its loss with 
respect to the (optimal) likelihood ratio test in the Gaussian case is usually negligible, a fact that 
will be illustrated by the numerical simulations of Section H) 

3 Change-point estimation and detection 

We now consider the setting in which the position of the potential change-points are un- 
known (still assuming that their number is known). Our proposal is to consider the statistic 
r(«i, . . . , ml-i) described in the previous section and to optimise it over all the possible change 
point locations. This proposal is however faced with two serious difficulties. The first one, 
which is of computational nature is related to the feasibility of the maximisation when there is 
more than a single change-point. We start by showing that the maximisation of T{ni, . . . , ni-i) 
is amenable to dynamic programming and stays feasible even when L is large. The second 
difficulty, to which a partial answer is provided in Section [3.2[ is statistical and concerns the 
interpretation of the value of T{ni, . . . ,ni^i) as optimising with respect to the change-point 
location obviously modifies the distribution of the values of the test statistic. This is a difficult 
issue in general, but we show how to obtain meaningful and simple-to-compute p-values for a 
variant of the test in the case of a single change-point. 

3.1 Multiple change-point estimation 

Assuming a known number of change-points L, w^e propose to use the test statistic described in 
Section |Z21 to determine the positions of the segment boundaries «i, . . . , «l-i- These unknown 
change-point locations are estimated by maximising the statistic T{ni,. . .,ni_i) defined in (|9j 
with respect to n^, . . ., «l-i- 

(ni,...,nL-i) = argmax T{ni,. . .,ni_i) . (14) 

In practice, direct maximisation by enumeration in llMl l is computationally prohibitive as it 
corresponds to a combinatorial task whose complexity grows exponentially with L. However, 
due to the fact that the matrix £„ is common to all segments, the statistic r(«i, . . . , n^^i) defined 



in Q has an additive structure which makes it possible to adopt a dynamic programming 
strategy. We refer here to the classical djniamic programming approach to the segmentation 



task which is described in lKavi (1993) use d by, among others, Bai and Perron (2003) and can be 
traced back to the note by lBellmanl (|l96l|) . More precisely, using the notations 



and 

L-l 

hip) = ^ max ^ A(n^ + 1 :«f+i), 

l<«l<---<«L_l<nj,=p ^^Q 

w^e have 

Jl(p)= max {Ji,_i(ni._i)+A(nL-i+l:p)} • (15) 

"L-l 

Thus, for solving the optimisation problem (14)1 , we proceed as follows. We start by computing 
the A(f : ;') for all (/,;') such that 1 < i < j < n. All the /i (E) are thus available for E = 2, . . . , n. 
Then /2(E) is computed by using the recursion i flSt and so on. The overall numerical complexity 
of the procedure is thus proportional to L x n^ only. 

3.2 Assessing the significance of the test in the single change-point case 

In addition to practical algorithms for estimating change-point locations, one needs tools to 
assess the plausibility of the obtained change-point configuration. An important step in that 
direction is to characterise the behaviour of T{ni, . . . , Ml-i) under the null hypothesis that the 
data are indeed fully homogeneous. This is a difficult issue in general due to the optimisation 
over all possible change-point configurations. A possible calibration approach consists in run- 
ning Monte Carlo experiments, possibly using bootstrap techniques if a representative sample 
of the baseline data of interest is available. We show below that in the case where L = 2, i.e., 
when looking for a single potential change-point, it is possible to obtain a simple computable 
approximation to the asymptotic p-value of the test. 

To do so, we consider in the rest of this section a modification of the test statistic used in 
(14)1 . The practical consequences of using this variant rather than the statistic T{n\) when L = 2 
will be discussed after Theorem |4] which states the main result of this section. 

Let Vn(n\) = {Vn,i{n\), ■ ■ •/ K,k(wi))' denote the vector such that 

K,ic(«i) = ^E E {l(X,,<X^,,)-l(Xy;,<X^.)},fc = l,...,X, (16) 

and define 

S„(«i)=V„(ni)'£-iv„(ni). (17) 

Note that V„ only differs from U^, by the normalisation, which is now independent of «i. We 

now consider the statistic 

W„ = max S„(ni) . (18) 

i<«i<«— 1 

The following theorem, proved in Appendix IB gives the asymptotic p-values of W„ under 
the null hypothesis that no change in distribution occurs within the observation data. 

Theorem 4 Assume that (X/)i<;<„ are 'R^-valued i.i.d. random vectors such that, for all k, the c.d.f. Fj^. 
of Xj J. is a continuous function. Further assume that the K x K matrix E defined in (O is invertible. 
Then, 

W„ A sup ( E B^(0 K as n ^ 00 , (19) 

0<f<l \i-=l / 

where d denotes convergence in distribution and {Bj({t), t e (0, l)}i</c<i;c ^''^ independent Broivnian 
bridges. 
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To determ ine the p-value Pval(^n) associated to | [T9l , one can use the following result due to 
Kieferl lll959l) : 




PvAb)='P{ sup 
\0<f<l 

^ ^ _ 4 ~ i7iK-2)/2,Mf~^exp[-{'r^K-2)/2,M)^]/2b 

r(f)2ffofil [/iC/2(7(K-2)/2,m)]' 

where }u is the Bessel function of the first kind, 7v,m is the jw-th nonnegative zero of Jv and F 
is the Gamma function. In practice, only a few terms of the series have to be computed. For 
values of K of forty or less computing the p-values from the thirty first terms of the series was 
sufficient. 

As noted at the beginning of Section [3.21 the normalisation of V„ differs from that of U„, 
resulting in a statistic W„ that does not coincide with T{ni). From our practical experience, re- 
placing V„ by U„ in the definition of W„, that is using T{n-[) instead of W„, produces a statistic 
that has the same detection and localisation capacities when the potential change occurs in the 
central region of the observation window, say between n/4 and 3n/4 observations. For potential 
changes occurring closer to the beginning or to the end of the observation window, T{ni) has an 
enhanced detection power at the expense of a slight increase in the rate of false alarms, with cor- 
responding spurious detections occurring mostly near the borders of the observation window. 
Proceeding as in Appendix|Fl one can prove a result related to Theorem|4]for T(«i) (used when 
L = 2) by imposing some additional conditions on the admissible values of Mi (namely, that the 
maximum is searched only for value of Wi such that tii/n is bounded from above and below). 
The resulting limit, expressed in terms of Bessel processes, does not yield easily computable 
asymptotic p-values, although approximations such as those studied by Estrella ( 2003) could be 
used for approximating extreme quantiles (that is, very low p-values). 

4 Numerical Experiments 

In this section, we report numerical experiments that illustrate different aspects of the methods 
proposed in Sections|2]and|3l A software implementing these methods in Python is available as a 
supplementary material of the paper. For easy reference, the two- or multi-sample homogeneity 
test defined by ^ is referred to as MultiRank-H in the following; the change point estimation 
criterion defined in ((T4|l is referred to as MultiRank. 

4.1 Illustration of the two-sample homogeneity test 

We start by considering the basic two-sample homogeneity test first introduced in Section 12.11 
For this, w^e generate baseline observations distributed as a mixture of two two-dimensional 
Gaussian densities with common mean (0, 0) and diagonal covariance matrices with diagonal 
terms equal to (4,0.2) and (0.2,4), respectively. For the alternative distribution, we generate 
observations having the same characteristics except that the mean is now equal to s = (0.5, 0.5). 
In this case, n = 100 and the two groups (baseline and alternative data) are of length n/2 = 50. 
Figure |2] (a) shows a typical example of the data, represented as a two-dimensional scatter plot. 
MultiRank-H is compared with three oth er approaches . The first is the the Maximum Mean 
Discrepancy (MMD) statistics proposed by iGretton et al.l l|2006) . which is a kernel-based test 
here used with a Gaussian ke rnel having a bandw i dth given by the med ian distance between 
the sa mples as suggested by Grett on et al.l l|2006|) . IPesobry et al. ' (2005) and Harchao ui et al. 



(|2008|) . The second approach is the classical Hotelling's T-^-test dChen and Gupta, 2000, p. 67) 



which is optimal in the multivariate Gaussian case. The third method is to use the likelihood 

11 



ratio (LR) test assuming a known structure for the model whose parameters (mean vectors and 
diagonal terms of the covariance matrices) are estimated using the Expectation-Maximisation 
algorithm. This latter approach is optimal in this context but is the only one that uses some 
knowledge about the distribution of the data. These methods are compared through their ROC 
(Receiver Operating Characteristic) curves, averaged over 1000 Monte Carlo replications of the 
data. 

From the results of Section |231 we know that, as the covariance matrix of the data is propor- 
tional to the identity matrix, the asymptotic performance of Hotelling's T^-test does not depend 
on the particular choice of the shift s, but only on its L^ norm. For the Multirank-H approach 
the answer is less straightforward as the coordinates of the data are not independent in this 
example. However, Monte Carlo estimation of the asymptotic performance index ds{S) in | [T3t 
shows that it does not varies by more than 1% with the direction of S. We indeed verified that in 
the setting of Figure |2] the variation in performance with the direction of the shift s is negligible. 

The results displayed in Figure Efb) show that MultiRank-H is on a par with the LR and 
outperforms the other two approaches. MMD performs somewhat better than Hotelling's T'^ in 
this context due to the non-Gaussian nature of the data. Figure |2lc) corresponds to the more 
difficult setup in which eight-dimensional i.i.d. Gaussian random vectors of variance 2.5'^ are 
appended to the data described previously. In this case, the data are thus ten-dimensional but 
the change only affects two coordinates. MultiRank-H is again comparable to the LR, which is 
still optimal in this case. Note the lack of robustness of MMD which is distinctly dominated by 
Hotelling's T'^ in this second scenario. 
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Figure 2: (a) Example of observations under baseline and alternative distributions, (b) ROC 
curves for MultiRank-H, MMD, Hotelling's T and LR, (c) Same as (b) with eight-dimensional 
Gaussian noise padding (that is, with K = 10). 
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4.2 Properties of the change-point detection test 

In this section, we investigate the properties of the change-point detection test based on the 
statistic T(«i) resulting from the optimisation of flSl l. The simulations reported in this section 
are based on the following common benchmark scenario: under (Hg), that is, in the absence 
of change, we generate 100 samples from a five-dimensional standard Gaussian distribution. 
Under {Hi), the observations are similar, except that the common mean of the Gaussian vector 
changes to 0.3 at a location which is either equal to 1/4 or 1/2 of the observation window, that 
is, at indices 25 or 50. This scenario corresponds to a simple situation where all coordinates 
of the data possibly undergo similar upward shifts. The ROC curves that are plotted in the 
following are based on 2000 replications of the simulated data. 

4.2.1 Comparison with marginal decisions 

The MultiRank test statistic is obviously based on a combination of marginal rank statistics. 
Nevertheless, it incorporates two important aspects of the multivariate change-point detection 
problem: first, detection of simultaneous changes in multiple coordinates should make the pres- 
ence of an actual change-point more likely, and, second, the existence of dependence between 
the coordinates should influence the decision. To illustrate these observations, we compare 
MultiRank with a simpler heuristic approach that combines marginal decisions based on Bon- 
ferroni bound, using as test statistic maxi<j.<Kmaxi<„j<„_i ^n,k{'^l)- The results obtained with 
the data-generating mechanism described at the beginning of Section 14.21 are displayed in the 
leftmost plot of Figure |3l We also compare both approaches in a setting where the covari- 
ance matrix of the Gaussian vector is not the identity matrix anymore but a tridiagonal matrix 
with a common value of 0.45 (positive correlation) or —0.45 (negative correlation) on the sub- 
and super-diagonal. The resulting ROC curves are displayed in the middle and right plots of 
Figure |3l respectively. 

The leftmost plot of Figure |3] shows that the approach that combines the marginal statistics 
by taking into account their correlation, that is MultiRank, outperforms the Bonferroni-type 
approach. Furthermore, when the coordinates are positively correlated, the rate of detection of 
the MultiRank method decreases for a given false alarm rate and when negatively correlated, 
the rate of detection increases. The performance of the Bonferroni-type approach on the other 
hand does not improve for the negatively correlated data. The MultiRank method captures 
an important feature of the problem that fails to be exploited by the mere marginal decisions: 
negative correlations in the data make the detection of simultaneous upward jumps easier while 
positive correlations render this task more difficult. 

Although Figure |3] deals with change-point estimation (which includes the maximisation 
with respect to the change point position), we note that Corollary|2]of Section lZ3l implies that the 
performance of the Multirank-H homogeneity test is here comparable to that of the Hotelling's 
T^ test as the condition number of the covariance matrices considered in Figures |3] is relatively 
small (less than 8.1). The Multirank change detection test obviously inherits this property as its 
performance is nearly indiscernible from that of the LR test for all three choices of the covariance 
matrix in Figure |3l 
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Figure 3: ROC curves for the MultiRank and the Bonferroni-type approaches when the coordinates are independent (left), positively correlated 
(middle) and negatively correlated (right). The change-point instant is located at 1/4 and 1/2 of the observations window of length 500. 
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Figure 4: ROC curves for the MultiRank approach and the likelihood-ratio procedure (LR) for three different proportions of outliers (from left to 
right: 0, 5 and 20%) when the change-point instant is located at 1/4 and 1/2 of the window of length 500. 



4.2.2 Robustness with respect to outliers 

Here we illustrate the robustness of the MultiRank approach with respect to outliers in the data 
by considering the same simulation scenario as in the previous section (with a larger shift of 
amplitude 0.5) progressively contaminated by large additive outliers. The outliers distribution 
is the multivariate Gaussian distribution with covariance matrix 10 Ids, instead of Ids for the 
baseline distribution (where Ids refers to the five by five identity matrix). The fraction of out- 
liers is varied between 0, 5 and 20%. The MultiRank approach is compared to the parametri c 
likelihood-ratio based change-point detection test described by ISrivastava and Worsleyl (|l986|) . 



This latter method, which is itself based on Hotelling's T^-test statistic, is optimal in the absence 
of outliers as the baseline and alternative distributions are both Gaussian. As shown in the left- 
most plot of Figure m MultiRank has comparable performance with the parametric approach in 
the case where there are no outliers in the data. How^ever, as shown in the middle and rightmost 
plots of Figure m MultiRank demonstrates its robustness with respect to the presence of outliers 
as it barely suffers from the presence of additive outliers contrary to the parametric approach. 

4.3 Application to genomic hybridisation data 

To illustrate the potential of the approach, we consider its application to the segmentation of 
multiple in dividual genomic data. We consider the bladder cancer micro-array aCGH dataset 
studied by V ert and Bleakle y (2010) which consists of records of copy-number variations, i.e. 
abnormal alteration of the quantity of DNA sections. 

The objective here is to jointly segment data recorded from different subjects so as to robustly 
detect regions of frequent deletions or additions of DNA which could be characteristic of cancer. 
Each of the 57 profiles provides the relative quantity of DNA for 2143 probes measured on 
22 chromosomes. We ran the change-point estimation algorithm on each of 22 chromosomes 
separately, thus processing 22 different 9- to 57-dimensional signals (depending on the selected 
groups of patients at different stages of cancer) of length 50 to 200 (the number of probes varies 
for each chromosome). 

In this paper, we have not considered principled methods for inferring the number of change- 
points from the data. We describe below an heuristic approach to determine the number of 
change-points which, despite its simplicity, performs in our experience much better than the 
use of generic penalties such as AIC or BIG. Values of the statistics lh{n), for L = 0, . . . , Lmax/ 
are first computed using the procedure described in Section |3A] The algorithm is based on the 
principle that in the presence of L* > 1 change-points, if /l('^) is plotted against L, the resulting 
graph can be decomposed into two distinct regions: the first one, for L = 0, . . . , L* where the 
criterion is growin g rapidly; and the second one, for L = L*, . . . , Lmax/ where the criterion is 
barely increasing (|LavielleL 120051) . Hence, for each possible value of L in L = 1, . . .,Lmax, we 



compute least square linear regressions for both parts of the graph (before and after L); the 
estimated number of change-points is the value of L that yields the best fit, that is, the value for 
which the sum of the residual sums of squares computed on both parts of the graph is minimal. 
For an illustration of this methodology, see Figure|5] The case L = is treated separately and the 
procedure described above is used only when the value of the test statistic W„ for the presence 
of a single change-point (see Section l3l2] l is significant (p-value smaller than 0.1%) based on 
Theorem HI 

Results are shown for a group of 32 profiles corresponding to Stage T2 of a tumour. In Fig- 
ure |6] the copy-number data and the segmentation of the whole set of chromosomes is displayed 
for two particular individuals, together with the corresponding stepwise constant approxima- 
tions of the data for the 32 individuals (in the bottom of the Figure). Figure [7] details the results 
pertaining to the 7th chromosome. In both cases, the segmentation result is represented by a 
signal which is constant (and equal to the mean of the data) within the detected segments. The 
bottom plot in Figure [6] particularly highlights the fact that several coordinates indeed jump 
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10 12 14 



Figure 5: Determining the optimal number of change-points. Here, the actual number of change- 
points is L* = 4; the optimal regression is displayed in solid lines, while a non-optimal alterna- 
tive (for L = 6) is displayed in dashed lines. 



at the same time, suggesting that the joint segmentation model is appropriate. On the other 
hand, it is also obvious that one cannot assume (see, e.g., the third change-point at index 64 in 
Figure [TJ that all coordinates undergo similar changes. Note also that in this application, the 
fact that the MultiRank test statistic is properly normalised with respect to the length n of the 
data and their dimension K is particularly important: n corresponds to the number of probes 
and varies with each chromosome, K represents the number of individuals and varies when 
considering different groups of subjects. 

As a reference, the group fused Lasso algorithm by lBleakley and Verj (|2011 ) outputs similar 
results. In particular, on the 7th chromosome, change-points are found at positions 21, 44, 65, 
102, 107, 112, 124, 132, 156 and 166. On the whole set of chromosomes, 96 change-points are 
found while the MultiRank estimation procedure outputs 98. 



5 Conclusion 



We proposed an approach for retrospective detection of multiple changes in multivariate data. 
The basic idea, used for homogeneity testing when the data groupings are known, is an exten- 
sion of well-known marginal rank based tests ( Wilcoxon/Man n- Whitney and Kruskal-Wallis) 
based on the idea originally proposed by I Wei and LachinI I l984|) . The use of this approach for 
change-point detection (when the segments boundaries are unknown) was shown to be compu- 
tationally feasible. In addition, it incorporates important aspects of the problem, in particular 
the fact that simultaneous detections in different coordinates make the presence of an actual 
change more likely. The method was shown to be robust against various alternatives and on 
a par with optimal methods in benchmark cases. The approach can also be straightforwardly 
modified to deal with ordinal data, missing or censored values. 

To improve the method, it w^ould be desirable to provide significance levels for the change- 
point detection test when used to detect more than a single potential change-point. We believe 
that by considering the normalisation used to define the statistic W„ in Section 13. 2[ it is possi- 
ble to study the asymptotic behaviour of the change-point detection statistic in the general case. 
This being said, the difference between the two forms of normalisation could be more significant 
when applied to more than two segments. On a different level, one should obviously consider 
more principled approaches for selecting the number of change-points. General-purpose penal- 
isation schemes could be used but we feel that novel ideas need to be developed specifically for 
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Figure 6: First and second row: copy number data for two different individuals with super- 
imposition of the segmentation. Third row: superimposition of the smoothed bladder tumour 
aCGH data for 32 individuals in Stage T2 cancer that result from the segmentation. Vertical 
dashed lines represent the separation between the different chromosomes. 



the change-point problem given the specific nature of over- and under-estimating the number of 
change-points. For instance, in many practical applications the significance of over-estimating 
the number of change-points depends not only on the number of spuri ous segments but also 
on their locations. Traditional appro aches based on complexity pen alties IIBai and Pe rron, 20031), 
Bayesian methods IIFearnheadl. l2006|) and sparsity -based criterions JHarchaoui and Levy-Ledud, 
2010l:IVert and Bleakleyl,l2010|) are already available but there is certainly room for new develop- 
ments in these fields. 



A Appendix: Proof of Theorem |T] 

The proof is based on the Hoeffding decomposition of U„if{n-[) for each k in {1, . . .,K}. For 
fu rther details on the Hoeff ding decomposition, we refer the reader to Chapters 11 and 12 
of Ivan der Vaarj lll998|) . For each k in {1,...,K}, let ^i,jt(y) = Jh{^>y)dFi^{x) and hii^{x) = 
Jh{x,y)dFif{y), where h is defined by h{x,y) = l(x < y) — l{y < x). By the continuity of Fjt, 
/jj jt(y) = ^Fi({y) — 1 and hii({x) = 1 — 2Fj-(x). The Hoeffding decomposition of Lr„ j;(ni) can 
thus be written as !i„ j-("i) = Qn,k{^i) + ^«,fc("i)' where 



0„,fc(Mi) 



Rn,k{ni) = 



ni 



V^Mn-n^) j=„,+i 



E h,k{Xi,k) 






"1 



Vnn 



i[n 



=^E E [HXi,k^Xj,k) 



hk{Xi,k)-h,k{Xj,k)]- 



(21) 



(22) 



We first prove that LZ„i-(wi) = ^n,J:('^l) +Op(l) by showing that Var[]?„ j-('^l)] ^^ 0, as n 
tends to infinity. Using that E[L7„ j-(ni)] = E[!J„ j-(«i)] = 0, we obtain that Var[R„ j.(ni)] = 
Var[LI„,;t(ni) " !J,a(«i)] = E[Lf2fc(ni)] +E[!j2^(ni)] - 2E[U„^k{ni)Un,k{ni)]. By independence 
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Figure 7: Data for 32 individuals in Stage T2 bladder cancer with superimposed segmentation 
for chromosome 7. 10 change-points were estimated and the dashed vertical lines correspond 
to the estimated segment boundaries. 



nn-\ [n — nA . ^—',, 



^ nKkiH^f 



of the (X; j-)i<Kn/ we obtain that 

E[!i„%(ni)] = 

Using that 

EK,c(Xyc)'] = 4]E[(F,(Xi,,) - 1/2)2] ^ 4Var(W) = 1/3 , 

where W has a uniform distribution on [0, 1], we get, on the one hand, that 



^" "i)' tnhki^.Mn (23) 



nni{n-ni) f-[ 



mlM)] 



n\{n-ni) {n-nifni 



3nni(n — «i) 3nni{n — ni) 



1/3. 



(24) 



(25) 
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On the other hand 



nni(«-«i),tt;=nj+i 
1 



^ ^ E[h{Xi^,,Xj^,)h{X,,^,,Xjj,)] 



+ „„,(„_„,) E E ]E[MX,„X,,)MX,>X^^,)]. (26) 

We separately study the three terms of the r.h.s of | |26l l. Using that (X, ji-)i</<» are i.i.d. , we get 

1 ^ « ^^,^^ ^ , ni(n-ni) 



^ ^ E[MX,,,X^yO'] = JV^_ '> [MXu^X„^+u)'] ^ 0, as n ^ a. . 



nni (m - Ml ) ;tt ;=tr+i ' «ni (n - ni ) 

(27) 
Then, by continuity of Fj., we have 

— ^i— -^ E E ]E[Mx,>x,,)Mx.,„x,,)] 

("5 -'"""-«■' /(2fiW-l)(2f.(y)-lW-W^ "''"'.: '.''"-"■' . (28) 



nn\(n — n\) J ' 3nni(n — mj) 

Using similar arguments, the last term of the r.h.s of | [26l l is equal to «i(n — Mi)(m — rii — 
l)/{3nni{n - «i)). With (|27ll and ^, we obtain 

E[Lr2,, («i)] ^ 1/3, as n ^ 00. (29) 

Since E[!J,^.(mi)LJ„ jt(«i)] -^ 1/3, as n -> oo, (|25ll and l|29]l lead to Var[R„;c(«i)] ^> and 
thus U„^if{ni) = ll„i^{ni) + Op{l), as n tends to infinity. The multivariate central limit theorem 
then yields (LI„4(ni), . . ., LZ„^k(mi))' -^ A/'(0,I,) , where the (A:,fc')th entry of I, is given by 
^kk' = lim«-j.ooE[JJ„j-('^i)!Jn,)c'("i)]- Using that the (X/^jt)i</<i; are i.i.d., we obtain that 

4^2 n 

E[Lr„,,(ni)Lr„,,'(«i)] = ^^ , 1 ^ . E E[{F,(X^;fc) - 1/2}{F,,(X^.,,) - 1/2}] 

, 4(n-«a)' |^E[{F,(X,,)-l/2}{F,,(X,„)-l/2}] 



nMi(M-ni) ;^ 

= 4Cov(F,,(Xi,0,Ffc,(Xu')) • 

Thus, Z^^/^(!J„4(ni), . . ., Un^Ki^i))' — ^ A^(0,IdK). Since £„ — > Z, we deduce from Slutsky's 
Theorem that I,„"^'^^(LZ„,i(«i), . . . , LI„,k(wi))' — ^ A/'(0,IdK), which concludes the proof. 

B Appendix: Proof of Theorem |2] 

Using that Rf^ = ELi l(X,yc < Xy;,), we obtain that 

R? - ^ = ' ( E EK^Hc < x,,)^ - ^ 



\j=n(+l i=l 



^ E Efl(X.yc<Xy,^)-l/2l . (30) 
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Let h{x,y) = l(x < y), h^^^iy) = /l(x < y)dFfc(x) and 7t2,fc(x) = /l(x < y)dF,t(y)- By 
continuity of fj^: /Ji,fc(i/) = ft(i/) and h2^k{^) = 1 — Fi:(x). Using the notation: 

<^ = («m - n,f'^/n{Rf -{n + l)/2) , 
the Hoeffding decomposition yields 



7^ 



(/c) _ («f+l - Mf) 



1/2 



+ 



(«m -«£)'/' 



n-l 



''('+1 



M£+i - n^ - 1 " 






E('^2,'t(Xa)-l/2) 



E E UP^.> ^M - Kki^j,^) - h2,kiX,,k) + 1/2} 



'l|f7^g + 7^g + 7^g. (3i) 

Note that TZj, , = Op(l), as n tends to infinity, since it can be proved that Var(72.^ 3 ) ~ Var[72.^ — 
{TZj, -^ + TZ^ 2 )] ^- 0, as « tends to infinity. Thus, ^l\ can be rewritten as 

n-l "*+i 

.M/2 



7^ 



(^) 



"''+1 t, _ M — 1 " 



L-l v-"p+l 



Since ELi(l/2 - Ffc(Xa)) = E;;=i Ey:;;+i(l/2 - F,(Xy,,)), 



7^ 



i ^ .(..., -^M/2 E (F/c(X;,i:)-l/2)- ^^^^^^_^^^i/2 E jLj^kiXj,, - 1 /2)) + 0^(1) 



n{ne+i - n^)i/2 .4^,^^ 



def . 



Observe that, for a fixed £ in {0, . . . , L — 1} and k, k' in {1, . . . , K}, we get, as n tends to infinity, 

ACov{Ukini,ni+i),Uk'ini,ni+i)) = 



^kk' 



2 L-l 



^ («£+! - »£) \ _^y 



p=0 



("^+1 -w^-l)^('-'p+i -Wp) 



^ (1 - t^+OStt-, , (32) 



where we have used that E^g (n»+i — n-p) = n — [n^^i — n^). In the same way, for fixed k,k' in 
{1, . . . , X} and ^ 7^ ^' in {0, . . . , L — 1}, we obtain, as n tends to infinity. 



4Cov(Lr,t(n£/n£+i)/Lr/c'("^''"^'+i)) ^ - Vte+ik^'^kk' ■ 



Let 



^„=2 



("1 -Wo) 



1/2 



R'o 



(ml-«l-i) 



1/2 



-R 



L-l 



We deduce from (|32]|, ||33ll and the multivariate central limit theorem that 



(33) 



Rn -^A/'(O,00Z),«^oo, 
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where Z is the Kx K matrix defined in l|5ll, (3 denotes the Kronecker product, = Id^ — \/t^/t 
with s/l = ( VIT, . . . , v^)'. Thus, 

rJ ^A/'(O,0®IdK),;t^oo, 

where 

Rl = 2 {^'''-'f\-'l^^, ("^-"^^-^^'''e-i/^R-,,] ' . 

Since £„ — ^ 2L, as n tends to infinity, the same convergence holds when Z is replaced by £„. 
Since E^'Zq (w^+i — w^)/« = 1, H^^i 1^ = 1 and the matrix f has eigenvalue of multiplicity 1 
(with eigenspace spanned by y/t), and eigenvalue 1 of multiplicity L — 1. Hence, the eigenvalues 
of Mk are 0, with multiplicity K, and 1, with multiplicity (L — 1)K, which concludes the 
proof using Cochran's theorem. 

C Appendix: Proof of Theorem |3] 

In Appendix [Aj we proved that under the null hypothesis where the X,'s are i.i.d. random 
vectors such that the c.d.f F^- of Xj j. is continuous: 

2mi " 

\/""l(w-«l);=Hl+l 

2(n-«i) ^|p^^(x,fc)-l/2}+op(l),as«^oo. (34) 



yjnni[n-ni) ;^i 

Since, by assumption, the model /(x — Q) is differentiable in quadratic mean at 6, the log- 
likelihood ratio L„ defined by 

'" = '°^ [ i^ijm . 

satisfies the following asymptotic expansion as n tends to infinity 

Ln = —^ E Vlog/(X^)-^i^<^V + °p(l)- (35) 

Combining multivariate central limit theorem with ( |34t and | [35l l one can thus show that (U,;, Ln) 
converges in distribution to a Gaussian random vector. From the expression of the asymptotic 
covariance matrix and using Le Cam's third lemma, one obtains that under (H^ „) 



U„(«i) ^ A6<(2^fi(l - h)ASX) ■ 

Using Glivenko-Cantelli Theorem, the w^eak law of large numbers and the fact that //^ is bounded 
for all k -which implies that Fj. is a Lipschitz function for all k- it can be proved that, under 
{Hi^n)i ^n converges in probability to Z. The conclusion follows using Slutsky's Lemma. 

D Appendix: Proof of Corollary [1] 

Assuming independence, E = 4Eo[{Fi(Xij) — 1/2}^] Id]^ = 1/3 Id^, where Idj^ denotes the 
K X K identity matrix, and A is the diagonal matrix with elements Aj-j. = J^Fi^{x)fl^{x)dx = 
Jj, /^(x)dx = —err /p(p't- />((rji.x))^dx. The lo wer bound on dg is obtained by the classical 



result that 12Af > 108/125 (|van der Vaarj Il998l p. 198). Finally, in the Gaussian case, A/^ = 

l/(2v^). 
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E Appendix: Proof of Corollary |2] 

Let us first prove that A is a diagonal matrix such that Aj- J. = aj^ /{2^/tz). LetD = (djc/)i<)c/<K = 
C-^ then 

where O is the c.d.f. of a standard Gaussian random variable and Xj , = X^ ./tr, is a standard 
Gaussian random variable. Since <l>(x) —1/2 = X]p>i(ap/p!)^p(^)'"where a^ = 'E[(^{Z)Hp{Z)], 
Z is a standard Gaussian random variable and Hp is the pth Hermite polynomial with leading 
coefficient equal to one (Hi(x) = x, H2{x) = x^ — 1, ...). By applying Mehler's formula, one ob- 
tains E[<l>(Xxit)Xi ,] = iXiC](j/{crj(Crj), where oci = E[Z'I>(Z)] = J^ (p^{x)dx, using an integration 
by parts. Thus, Aj-^^ = (7^7 /{'2.y/n)J^j^id(jCj^i( = a^ /{ly/Ti)ls^n. The expression given for 
I!, can be obtained similarly and is well-known in the literature as it provides t he link between 
the Spearman correlation and the usual correlation coefficients l|KruskalLll958|) . Regarding the 
low^er bound, first note that 

ARE= (47T)-ij'diag(D-f\...,cr-i)(4E-i)diag(t7fi,...,(7^i)J/j'C-ij 

> (47r)-Xi, (diag((7-i,. . .,(7-i)(4E-i)diag(c7-i,. . .,(7^1)C) 

= (47r)^V^2^Ai^in(C)/Ai„ax(S/4) . 

From the expression of E, it is easily checked that Ej-^ < iQ^^/Scr^T a^ \ < \Ci^^e\/ 3(7^^^, which 
gives the second result. 

F Appendix: Proof of Theorem |4] 

We start by proving I l9t w hen £„ is replaced by Z, in ((T7t . For this, w^e shall verify the assump- 



19681 Theorem 15.6): the convergence of the finite-dimensional distribu- 



tions of l|Billingsley 
tions: 



V„([nfiJ)'E-iv„([nfiJ),...,V„([nfpJ)'E-iv„([nfpj: 

' K K ' 



d 



J2Blih),...,'£Bl{tj,) , forO<fi <...<tp<l, «^cx>, (36) 
\k=l k=l J 



and the tightness criterion for the process: 

{Y„{[nt\yZ-^Y„{[nt\); < t < l} , 

where [xj denotes the integer part of x. Let nj = L'^'^iJ' with ti in (0,1). In the same way as 
in Appendix El as V„j( ( ■ ) only differs from Lr„ ^ ( • ) by a normalising factor, we can prove that 

K,)c("i) = K,*:(wi) + Op(l)/ with < ni < n and 

Vn,k{ni) = -Y/2 E Kki^i,k) sTT" E^u(X;,/t) / 

where hi^i^{x) = 2Fj-(x) — 1 and that 

E[K,,c(«i)K,/c'("i)] ^4fi(l-ti)Cov(F^(Xi,^),F;t'(Xi,/c')). asM^oo. (37) 
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Let M2 = \_nh\- Since 1 < n^ < n2 < n, tii/n ^ fj e (0/1)/ and n2/n -^ t2 & (0,1) we get 



nVn,k{ni)V„,k'{n2)]='E 



Ml 

„3/2 



E ^i,^(X;, 



i,k) 



/=«i+i 



« — «i 



"1 



E^u(X,-;c) 



/=i 



"2 

^,3/2 



E Kk'i^j.k 



j=n2+l 



n — n2 



"2 



E^u'(^a') 



i=i 



• (38) 



By decomposing the interval [mj + 1, m] (resp. [1, M2]) into [«i + 1, n2] and [^2 + 1/ w] (resp. [1, Wi] 
and [ni + I/M2]) and developing the expression, we obtain 



n%,ki"i)V„,t{n2)] 
E 



{n-ni){n-n2) ^. ,„ ., ,„ . ni{n-n2) ^ , /^ ^) ^v \ 
^ -\ -)^h,k{^i,k)h,k'{Xi^k') -^ E h,kiXi,k)h,k'iXi,k') 



J=l 



;="i+i 



H 3- E h,k{^i,k)h,k'i^i,k') 

j=n2+l 

n-[{n — ^2) 



^kk' — >ti{l-t2)'E.,,,,,,asn^oo. (39) 



With ^ and ll39|, we obtain 

/^V„(«i) 

VVn(«2); 

which is equivalent to 



Vh(«i)\ rf, ,r^nY fi(l-fi)S 



A^ 0; 



fi(l-t2)S 



fi(l-f2)S 



^2(1-^2)2: 



V„(ni)\ rf /S2 \ /B(fO 
V„(n2)j I Zij lB(f2);' 



(40) 



(41) 



where B(t) = (Bj (f ), . . . , 8^(0 )/ < f < 1. For the sake of clarity and without loss of generaUty, 
is thus proved in the case p = 2 by applying the continuous function 



I ''^l''^^ 1 , where xi,X2 e IR^- 



I ; I , wnere xi,X2 t Jiv . (42) 

\X2X2J 

In the following, w^e check the tightness condition, that is, for < fi < f < ^2 < 1/ w^e show that 

\V„{[nt\)^-^Y„i[nt\)-Yn{[nti\)-L-^Yn{[nh\)\^ 

x\Y„{[nt2\)^-^Y„{[nt2\)-Vn{[nt\)Z-^Vn(.[nt\)\^] <C\t2-h\^, (43) 



E 



where C is a positive constant. Let x„(f) = {x„^i{t), . . .,x„^K(t))' = AYnilnt] ), where A = 11 2, 
whose (p, q)th entry is denoted by flp,q. The l.h.s. of | |43|| can thus be rewritten as 



E 



|x;(Ox„(f)-x;(fi)x„(fi)r|xUf2)x„(f2)-x;(f)x„(oi 



= E 



E{4.(0-4.(fi)} 

k=l 



E{^lk'ih)-xl,k'm 

k'=l 



■ (44) 
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Note that 

K K 

k=l k=l 

= E [fl^k,p[Vn,p{\.nt\)-Vn,p{\.nh\)]\ [ E «;cy[v;y(L«fJ) + Ky(L«fiJ)] 

k=i Vp=i / \p'=i 

= E ^p,v'[^nAl^t\)-Vn,p{[nh\)][Vn,f{[nt\) + V^^j,,{[nh\) 

p,p'=l 



, (45) 



where b^y = Y^^^i ^k,v^k,v' is the (p, p')th element of the matrix B = A^ — IL '^ . Similarly 

K K 

T.{<,k'i^2)-xl,,{t)}= ^ \,'[Vn,,{[nt2\)-Vn,,{\_nt\)] \K,,'{\.nh\) + %^^,{\_nt\) 

k'=l q,q'=l 

(46) 
Using the notations £ = [nt], i-y = [nti\ and £2 = I'^^l]/ ^rid decomposing the interval [l,n] 
into [1, £1], [£i + 1, £], [£ + 1, £2] and [£2 + 1, n], we get 



Vn,pW -Vn,pm = ^-3^ itKpiX,,p) 



M-(^-£l) 



^3/2 









(47) 



and 



Vn,p{£) + V„,p{£l) = 



' + £1 

„3/2 



E'^l,p(^'>) 



U = l 



"-<'^^''^i: v«,; 



^3/2 



v!=£i+l 



-37^ E ^i,p(x.,p) 



i3/2 



I 

- E Kpi^i,p 



./=/'2 + l 



(48) 



with similar results for the terms of ( |46l l. Equation ||44|| is the expected value of the product of 
the squares of | |45t and ( |46t . Using Cauchy-Schwarz inequality, (|44|l is bounded above by the 
sum of several terms, obtained by inserting l|47|l and ( |48t in ( |45t and | |46l l, respectively. Among 
these terms, we consider the case of: 



ci E E ^lA. 

p,p'=l q,ij' = l 



2 .2 (n-{£-£im£ + £inn-{£2-£)nh + £)^ 



.12 



X E 



E ^l,pi^i,p] 



Y^h,p{Xi,p) 



! = 1 



h 



E '^i,p(^<,f 



;=f+l 



E Kp(^i,p) 

1=4+1 



(49) 



Using the independence of (X,i-)i<,<„, the expected value in l(49|l can be separated into the 
product of four expected values, and thus can be bounded by 



{£ - 4)4 (£2 -£){n- £2)1^^ < n\£2 - 4)V3*. 



(50) 



Equation l|49)l is thus bounded above by a quantity proportional to {£2 — £\)^ /rp- = ([^^2] ~ 
lnti\)^/rp-. All the terms appearing in the expansion of (|44t can be treated similarly. This 
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completes the proof of ( |43t and thus ensures that 

sup V„i[nt\yi:-^Y„{[nt\) -^ sup J^ ^i^)' " ^ ~ • (51) 

0<f<l 0<t<lA:=l 

In order to prove l ISTI l when Z is replaced by £,;, it enough to prove that supg^^^j | V„ ( [nt] )'(S~^ - 
t-^)Y„{[nt\)\ =Op(l). Note that 

|V„(LntJ)'(S-i-£,7i)V„([niJ)| < l|£-i-E-il| sup l|V„(L«tJ)f , 

0<t<l 

where £~^ — > E"^ and supg^j^j II V,!( [n^J ) \\^ = Op(l), by l ISTJ I, which concludes the proof. 
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